Groundtruth Image Generation from Electronic Text (Demonstration)

Author

  • David Doermann
Abstract

The problem of generating synthetic data for training and evaluating document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, there is a tremendous need to rapidly generate data in new languages and scripts without developing specialized systems. We have developed an approach that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The Metafile information is parsed to generate zone, line, word, and character groundtruth, including location, font information, and content, in any language supported by Windows. The processing is embedded in a collection of tools for data generation, groundtruthing, degradation, and evaluation. The discussion here focuses on the Groundtruth Generator.

Similar resources

TRUEVIZ: a groundtruth/metadata editing and visualizing toolkit for OCR

Tools for visualizing and creating groundtruth and metadata are crucial for document image analysis research. In this paper, we describe TrueViz, a tool for visualizing and editing groundtruth/metadata for OCR. TrueViz is implemented in the Java programming language and works on various platforms including Windows and Unix. TrueViz reads and stores groundtruth/metadata in XML format, an...

Improvement of generative adversarial networks for automatic text-to-image generation

This research concerns the use of deep learning tools and image processing technology to automatically generate images from text. Previous research has used a single sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, each given as a sentence, to produce and refine the image. The proposed ...

The Architecture of TrueViz: A Groundtruth/Metadata Editing and Visualizing Toolkit

Tools for visualizing and creating groundtruth and metadata are crucial for document image analysis research. In this paper we describe TrueViz [LK00, KLCB01], which is a tool for visualizing and editing groundtruth/metadata. We first describe the groundtruthing task and the requirements for any interactive groundtruthing tool. Next we describe the system design of TrueViz and discuss how a user ...

The architecture of TRUEVIZ : A groundTRUth /

Tools for visualizing and creating groundtruth and metadata are crucial for document image analysis research. In this paper we describe TrueViz [LK00, KLCB01], which is a tool for visualizing and editing groundtruth/metadata. We first describe the groundtruthing task and the requirements for any interactive groundtruthing tool. Next we describe the system design of TrueViz and discuss how a user c...

Automatic Generation of Character Groundtruth for Scanned Documents: A Closed-Loop Approach (Proceedings of the 13th International Conference on Pattern Recognition, 1996)

Character groundtruth for scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not possible because (i) accuracy in delineating groundtruth character bounding boxes is not high enough, (ii)...

Publication date: 2003